Model-Free Monte Carlo-like Policy Evaluation

Authors

  • Raphaël Fonteneau
  • Susan A. Murphy
  • Louis Wehenkel
  • Damien Ernst
Abstract

We propose an algorithm for estimating the finite-horizon expected return of a closed loop control policy from an a priori given (off-policy) sample of one-step transitions. It averages cumulated rewards along a set of “broken trajectories” made of one-step transitions selected from the sample on the basis of the control policy. Under some Lipschitz continuity assumptions on the system dynamics, reward function and control policy, we provide bounds on the bias and variance of the estimator that depend only on the Lipschitz constants, on the number of broken trajectories used in the estimator, and on the sparsity of the sample of one-step transitions.
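The averaging over broken trajectories described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation. The abstract does not state the selection rule, so the sketch assumes that each step picks the unused transition whose (state, action) pair is closest in Euclidean distance to the current state and the action the policy prescribes, and that each transition is used at most once. The function name `mfmc_estimate`, its parameters, and the tuple layout `(x, u, r, y)` are hypothetical.

```python
import math


def mfmc_estimate(transitions, policy, x0, horizon, n_trajectories):
    """Average the cumulated reward over broken trajectories rebuilt from a sample.

    transitions: list of (x, u, r, y) with x, u, y tuples of floats and r a float.
    policy: callable (t, x) -> u, the closed-loop policy to evaluate.
    """
    # Assumption: a transition is consumed once it has been selected.
    unused = list(transitions)
    if len(unused) < horizon * n_trajectories:
        raise ValueError("sample too small for the requested broken trajectories")

    returns = []
    for _ in range(n_trajectories):
        state, total = tuple(x0), 0.0
        for t in range(horizon):
            target = state + tuple(policy(t, state))
            # Assumed selection rule: nearest unused (x, u) pair, Euclidean distance.
            best = min(
                range(len(unused)),
                key=lambda i: math.dist(unused[i][0] + unused[i][1], target),
            )
            _, _, reward, next_state = unused.pop(best)
            total += reward
            # The trajectory is "broken": it jumps to the end state of the
            # selected transition rather than to a simulated successor.
            state = tuple(next_state)
        returns.append(total)
    return sum(returns) / n_trajectories
```

With a sample that happens to contain the exact transitions the policy would generate, the estimate reduces to the true return of that trajectory; with a sparser sample, the selected transitions lie further from the policy's own path, which is the effect the bias bound in the abstract accounts for.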

Similar resources

Error Bounds in Reinforcement Learning Policy Evaluation

With the advent of Kearns & Singh’s (2000) rigorous upper bound on the error of temporal difference estimators, we derive the first rigorous error bound for the maximum likelihood policy evaluation method, as well as a Monte Carlo matrix inversion policy evaluation error bound. We provide the first direct comparison between the error bounds of the maximum likelihood (ML), Monte Carlo m...

Loss of Load Expectation Assessment in Deregulated Power Systems Using Monte Carlo Simulation and Intelligent Systems

Deregulation policy has changed the concepts of power system reliability assessment and enhancement. In this paper, generation reliability is considered, and a method for its assessment using intelligent systems is proposed. Because the power market and generators’ forced outages behave stochastically, Monte Carlo simulation is used for reliability evaluation. Generation r...

Development and implementation of a Monte Carlo framework for evaluation of patient-specific out-of-field organ equivalent dose

Background: The aim of this study was to develop and implement a Monte Carlo framework for evaluation of patient specific out-of-field organ equivalent dose (OED). Materials and Methods: Dose calculations were performed using a Monte Carlo-based model of Oncor linac and tomographic phantoms. Monte Carlo simulations were performed using EGSnrc user codes. Dose measurements were performed using r...

Factoring Exogenous State for Model-Free Monte Carlo

Policy analysts wish to visualize a range of policies for large simulator-defined Markov Decision Processes (MDPs). One visualization approach is to invoke the simulator to generate on-policy trajectories and then visualize those trajectories. When the simulator is expensive, this is not practical, and some method is required for generating trajectories for new policies without invoking the sim...

Monte Carlo Study of the Effect of Backscatter Material Thickness on 99mTc Source Response in Single Photon Emission Computed Tomography

Introduction: SPECT projections are contaminated by scatter radiation, resulting in reduced image contrast and quantitative errors. Backscatter constitutes a major part of the scatter contamination in lower energy windows. The current study is an evaluation of the effect of backscatter material on FWHM and image quality investigated by Monte Carlo simulation. Materials and Methods: SIMIND program...

Publication date: 2010